1. Face recognition

1.a. Eigenfaces

In this section, the first 6 eigenfaces of each subject are explored.

Subject 01

Subject 02

The eigenfaces look like blends of different faces, which is in line with the intuition that each projection axis is a linear combination of the original axes. The expression seems to become more specific as the order of the eigenface increases. Shadows are also more pronounced in the first few eigenfaces, consistent with the first eigenfaces capturing the directions of highest variance in the data. The glasses are more pronounced in the subject 2 eigenfaces, reflecting the proportion of that subject's images in which glasses are worn.
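As a minimal sketch of how such eigenfaces can be obtained, the snippet below runs PCA through an SVD on a stack of flattened images. The function name top_eigenfaces and the synthetic random images are assumptions for illustration only; the actual preprocessing (image size, downsampling) is not specified here.

```python
import numpy as np

def top_eigenfaces(images, k=6):
    """Return the mean face, the top-k eigenfaces and their singular values.

    images: array of shape (n_images, height, width) for one subject
    (hypothetical input layout).
    """
    n, h, w = images.shape
    X = images.reshape(n, h * w).astype(float)   # one flattened image per row
    mean_face = X.mean(axis=0)
    Xc = X - mean_face                           # center the data
    # Rows of Vt are the principal directions, ordered by decreasing
    # singular value, i.e. by decreasing explained variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean_face.reshape(h, w), Vt[:k].reshape(k, h, w), s[:k]

# Synthetic stand-in for one subject's images (assumption, not real data).
rng = np.random.default_rng(0)
faces = rng.random((10, 8, 8))
mean_face, eigenfaces, sing = top_eigenfaces(faces, k=6)
```

Each eigenface is a unit-norm linear combination of the centered input images, which is why it appears as a blend of several faces.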

1.b. Face recognition via projection errors to 1st eigenfaces

s_ij indicates the projection residual of test subject j with respect to eigenface i. Face recognition is done based on the minimum projection residual to a particular eigenface (the closest eigenface). In this case, both test subjects are recognized correctly (subject 1 is closer to eigenface 1 and subject 2 is closer to eigenface 2).
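The decision rule can be sketched as follows, assuming the residual is the squared norm of the part of the vectorized test image that is orthogonal to the first eigenface. Whether the mean face is subtracted beforehand is not stated here, so that step is left to the caller; the names projection_residual and recognize and the toy 3-pixel vectors are hypothetical.

```python
import numpy as np

def projection_residual(x, u):
    """Squared norm of the component of x orthogonal to the direction u."""
    u = u / np.linalg.norm(u)                 # make sure u has unit length
    return float(np.sum((x - (u @ x) * u) ** 2))

def recognize(x, first_eigenfaces):
    """Return the index of the closest eigenface and all residuals."""
    residuals = [projection_residual(x, u) for u in first_eigenfaces]
    return int(np.argmin(residuals)), residuals

# Toy 3-pixel "images" (assumption, for illustration only).
u1 = np.array([1.0, 0.0, 0.0])               # first eigenface of subject 1
u2 = np.array([0.0, 1.0, 0.0])               # first eigenface of subject 2
test_image = np.array([0.2, 3.0, 0.1])
label, res = recognize(test_image, [u1, u2])  # label is a 0-based index
```

Here the test vector lies almost entirely along u2, so its residual with respect to u2 is the smaller one and index 1 (the second subject) is returned.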

2. ISOMAP

2.a. Adjacency matrix

In the following procedure, the adjacency matrix A is defined entrywise: A_ij is the Euclidean distance between datapoint i and datapoint j if that distance is at most epsilon, and 0 otherwise (the two points are treated as unconnected). Here, an epsilon value of 12 is used.
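A minimal sketch of this definition, assuming the datapoints are stored as rows of a NumPy array. The function name epsilon_adjacency and the three toy points are illustrative assumptions.

```python
import numpy as np

def epsilon_adjacency(X, eps):
    """Epsilon-neighborhood adjacency matrix with Euclidean edge weights.

    X: array of shape (n_points, n_features). Entry (i, j) holds the
    distance between points i and j if it is <= eps, and 0 otherwise.
    """
    diff = X[:, None, :] - X[None, :, :]       # all pairwise differences
    D = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    return np.where(D <= eps, D, 0.0)

# Toy points (assumption): the first two are close, the third is far away.
X = np.array([[0.0, 0.0], [3.0, 4.0], [30.0, 40.0]])
A = epsilon_adjacency(X, eps=12)
```

With eps = 12, only the first two points (distance 5) are connected; the third point has no neighbors. The broadcasted difference uses memory proportional to n_points squared times n_features, so for large image sets a chunked or library-based distance computation may be preferable.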

2.b. Visualization of ISOMAP result

In the following procedure, run_isomap of the Isomap class constructs the adjacency matrix with the same method as shown above and then does the following in order:

Z is then plotted as the d-dimensional embedding (d = 2 in this case), which preserves the dissimilarities between datapoints. Image snippets are included at random to analyze whether their similarities are captured in the embedding.
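The individual steps of run_isomap are not reproduced here, so the sketch below follows the textbook ISOMAP pipeline as an assumption: all-pairs shortest paths on the neighborhood graph, double centering of the squared geodesic distances, and an eigendecomposition whose leading eigenvectors give Z. The function name isomap_embedding and the toy points on a line are hypothetical.

```python
import numpy as np

def isomap_embedding(A, d=2):
    """Embed points given an adjacency matrix where 0 means 'unconnected'."""
    n = A.shape[0]
    # Off-diagonal zeros are missing edges: treat them as infinite length.
    G = np.where(A > 0, A, np.inf)
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall all-pairs shortest paths (geodesic distances).
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("graph is disconnected; a larger epsilon is needed")
    # Double centering: C = -1/2 * H * G^2 * H with H = I - (1/n) 11^T.
    H = np.eye(n) - np.ones((n, n)) / n
    C = -0.5 * H @ (G ** 2) @ H
    w, V = np.linalg.eigh(C)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]              # indices of the d largest
    Z = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
    return Z, G

# Toy example (assumption): four points on a line, neighbors within 1.5.
pts = np.arange(4, dtype=float)
D = np.abs(pts[:, None] - pts[None, :])
A = np.where(D <= 1.5, D, 0.0)
Z, G = isomap_embedding(A, d=2)
```

In this toy case the geodesic distances equal the distances along the line, and the pairwise distances between rows of Z reproduce them. Floyd-Warshall costs on the order of n cubed operations, which is acceptable for a few hundred images.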

Faces facing the same direction seem to be located in the same region, indicating that the adjacency matrix and dissimilarity matrix successfully capture the relationships between datapoints, and hence that the embedding in Z successfully captures (dis)similarities between images in 2-dimensional space.

2.c. ISOMAP with l1 (Manhattan) distance

Now we repeat the steps above using the l1 distance. First, epsilon is picked such that a similar average number of neighbors is obtained (an epsilon of approximately 460 results in about 40 neighbors on average).

The adjacency matrix looks similar to the one obtained with the l2 distance.
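One way to pick such an epsilon is a simple scan over candidate values, keeping the one whose average neighbor count under the l1 distance is closest to the count obtained with the l2 distance. This is a sketch under assumptions: the helper names mean_neighbors and match_epsilon, the random data, and the candidate grid are all hypothetical, and the search actually used may differ.

```python
import numpy as np

def mean_neighbors(D, eps):
    """Average number of neighbors per point for a distance matrix D."""
    n = D.shape[0]
    within = (D <= eps).sum() - n              # drop the n self-matches
    return within / n

def match_epsilon(D, target, candidates):
    """Candidate epsilon whose mean neighbor count is closest to target."""
    counts = np.array([mean_neighbors(D, e) for e in candidates])
    return candidates[int(np.argmin(np.abs(counts - target)))]

# Synthetic data (assumption): 50 points with 5 features each.
rng = np.random.default_rng(0)
X = rng.random((50, 5))
diff = X[:, None, :] - X[None, :, :]
D2 = np.sqrt((diff ** 2).sum(axis=-1))         # l2 (Euclidean) distances
D1 = np.abs(diff).sum(axis=-1)                 # l1 (Manhattan) distances

target = mean_neighbors(D2, 0.6)               # neighbor count under l2
candidates = np.linspace(0.1, 3.0, 59)
eps_l1 = match_epsilon(D1, target, candidates)
```

Because the l1 distance between two points is never smaller than their l2 distance, the matching epsilon for l1 is typically larger than the one used for l2.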

It can be seen that, even though the embedding is different, similarities between images can also be captured with the Manhattan distance, as long as epsilon is chosen carefully to arrive at a reasonable number of neighbors.

2.d. Comparison with PCA

With PCA, grouping of similar faces still occurs. However, because similarity is computed by projecting the data linearly, PCA misses the more complicated relationships that make ISOMAP more successful in grouping faces by direction. For instance, the two leftmost images on the plot clearly face in opposite directions. However, due to their similarity in pixel values (color), their linear projections are close to each other. Furthermore, a clear separation between left-facing and right-facing images in the bottom left region of the plot seems to be absent.

ISOMAP with its graph-based similarity measure seems to correctly capture the difference between left-facing and right-facing images and correctly cluster them, resulting in a more meaningful grouping.

3. PCA: food consumption in European countries

3.a. Country similarity based on food consumption

As we are looking into similarities between countries, each country should be one datapoint. Therefore, the data matrix X is set up with columns corresponding to foods and rows corresponding to countries.
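A minimal sketch of a manual PCA on such a matrix, using an eigendecomposition of the sample covariance. The function name pca_scores, the random 16 x 20 matrix standing in for the country-by-food table, and the choice not to standardize the columns are assumptions; the preprocessing actually applied to the data is not specified here.

```python
import numpy as np

def pca_scores(X, k=2):
    """Project the rows of X onto the top-k principal components.

    X: array of shape (n_samples, n_features), one datapoint per row.
    Returns the scores, the component matrix W and the component variances.
    """
    Xc = X - X.mean(axis=0)                    # center each column
    C = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    w, V = np.linalg.eigh(C)                   # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]            # indices of the k largest
    W = V[:, order]
    return Xc @ W, W, w[order]

# Synthetic stand-in for the country-by-food table (assumption).
rng = np.random.default_rng(0)
X = rng.random((16, 20))
scores, W, variances = pca_scores(X, k=2)

# The sign of each component is arbitrary: negating a column of the scores
# mirrors the plot but leaves all pairwise distances unchanged.
flipped = scores * np.array([1.0, -1.0])
```

Two implementations of PCA can therefore produce plots that are mirror images of each other while describing the same structure.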

For comparison with sklearn's PCA package:

The manual implementation is equivalent to the one with sklearn's PCA package, other than being flipped along the y-axis, which is expected since the sign of each principal component is arbitrary. It can be seen that the plot of the first 2 principal components groups countries geographically based on their similarities in food consumption. There seem to be 3 clusters separating the European countries based on their latitudes:

This also suggests that food consumption is naturally related to climate, which perhaps dictates which crops dominate each region.

3.b. Food similarity based on consuming country

Since we are now looking for similarities between food items, the data matrix from the previous section needs to be transposed so that each food item is one datapoint.

There again seem to be clusters in the food similarity plot. Garlic is often used together with olive oil for cooking. The bottom right seems to host foods with a long shelf life, including frozen and canned food, yoghurt and potatoes. On the bottom left, we can see food items that are normally consumed together, including fruits, butter and margarine with tea and biscuits. The clusters also point out foods that are seemingly used together in a particular lifestyle, e.g. fruits + butter + margarine + biscuits + jam indicate baked goods and sweets (commonly consumed with tea in countries like England), while garlic + olive oil are staples of many cuisines.